Keep Document Parts (Text Processing)

Synopsis

Extracts the part of a token that matches a given regular expression and returns it as a new token.

Description

This operator extracts part of a token using a regular expression. It searches for the first region of the text that matches the given regular expression and returns that region as a new token. If no such region is found, the token is discarded. Because this tends to work best on long tokens, the operator is especially useful before the actual tokenization is applied during word vector creation.
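
The behavior described above can be approximated outside the operator with a short Python sketch. It is an illustration only, not the operator's implementation: the function names keep_document_part and keep_document_parts are hypothetical, and Python's re module is assumed here, whose regular expression syntax may differ in details from the one the operator uses.

```python
import re
from typing import List, Optional


def keep_document_part(token: str, extraction_regex: str) -> Optional[str]:
    """Return the first region of `token` that matches `extraction_regex`.

    Returns None when no region matches, which stands in for
    "the token is discarded" (hypothetical helper, not the operator's API).
    """
    match = re.search(extraction_regex, token)
    return match.group(0) if match else None


def keep_document_parts(tokens: List[str], extraction_regex: str) -> List[str]:
    """Apply the extraction to every token and drop tokens without a match."""
    kept = []
    for token in tokens:
        part = keep_document_part(token, extraction_regex)
        if part is not None:
            kept.append(part)
    return kept


if __name__ == "__main__":
    # A long, untokenized text: only the first matching region is kept.
    text = "Order 1234 shipped on 2020-01-05 and delivered on 2020-01-09"
    print(keep_document_part(text, r"\d{4}-\d{2}-\d{2}"))

    # Tokens with no matching region are dropped from the result.
    print(keep_document_parts(["id=42 and id=7", "nothing here"], r"id=\d+"))
```

Only the first match is returned, so a token containing several matching regions still yields a single new token. This is one reason the extraction is more useful on long, untokenized text than on single words.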

Input

document
The document port.

Output

document
The document port.

Parameters

extraction_regex
This regular expression specifies the part of the string that is extracted and returned. Range:
